Search CORE

36 research outputs found

A Parametric Approach for Efficient Speech Storage, Flexible Synthesis and Voice Conversion

Author: Nurminen Jani
Publication venue: Tampere University of Technology
Publication date: 01/01/2013

During the past decades, many areas of speech processing have benefited from the vast increases in available memory and processing power. For example, speech recognizers can be trained with enormous speech databases, and high-quality speech synthesizers can generate new sentences by concatenating speech units retrieved from a large inventory of speech data. However, even in today's world of ever-increasing memory sizes and computational resources, there are still many embedded application scenarios for speech processing techniques in which memory capacity and processor speed are very limited. Thus, there is still a clear demand for solutions that can operate with limited resources, e.g., on low-end mobile devices. This thesis introduces a new segmental parametric speech codec referred to as the VLBR codec. This novel proprietary sinusoidal speech codec, designed for efficient speech storage, is capable of achieving relatively good speech quality at compression ratios beyond those offered by standardized speech coding solutions, i.e., at bitrates of approximately 1 kbps and below. The efficiency of the proposed coding approach is based on model simplifications, mode-based segmental processing, and adaptive downsampling and quantization. Coding efficiency is further improved using a novel flexible multi-mode matrix quantizer structure and enhanced dynamic codebook reordering. Compression is also aided by a new perceptual irrelevancy removal method. The VLBR codec is also applied to text-to-speech synthesis. In particular, the codec is used to compress unit selection databases and to concatenate speech units parametrically. It is also shown that the efficiency of the database compression can be further enhanced by speaker-specific retraining of the codec.
Moreover, the computational load is significantly decreased by a new compression-motivated scheme for very fast and memory-efficient calculation of concatenation costs, based on techniques and implementations used in the VLBR codec. Finally, the VLBR codec and the related speech synthesis techniques are complemented with voice conversion methods that allow modifying the perceived speaker identity, which in turn enables, e.g., cost-efficient creation of new text-to-speech voices. The VLBR-based voice conversion system combines compression with the popular Gaussian mixture model (GMM) based conversion approach. Furthermore, a novel method is proposed for converting the prosodic aspects of speech. The performance of the VLBR-based voice conversion system is also enhanced by a new approach to mode selection and by explicit control of the degree of voicing. The solutions proposed in the thesis together form a complete system that can be utilized in different ways and configurations. The VLBR codec itself can be used, e.g., for efficient compression of audio books, and the speech synthesis methods can reduce the footprint and computational load of concatenative text-to-speech synthesizers to the levels required in some embedded applications. The VLBR-based voice conversion techniques can complement the codec both in storage applications and in speech synthesis. It is also possible to use only the voice conversion functionality, e.g., in games or other entertainment applications.
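The bitrate claim can be made concrete with simple arithmetic. The sketch below assumes a narrowband reference of 8 kHz, 16-bit linear PCM (128 kbps); that reference format and the function names are illustrative assumptions, not figures from the thesis.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bitrate of uncompressed mono linear PCM, in bits per second."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(reference_bps: float, codec_bps: float) -> float:
    """How many times smaller the coded stream is than the reference stream."""
    return reference_bps / codec_bps


def storage_bytes(bitrate_bps: float, duration_s: float) -> float:
    """Bytes needed to store duration_s seconds of audio at the given bitrate."""
    return bitrate_bps * duration_s / 8


# Assumed reference format: 8 kHz, 16-bit narrowband PCM (an illustration only).
reference = pcm_bitrate_bps(8000, 16)
codec = 1000  # "approximately 1 kbps", as stated in the abstract

print(reference)                            # 128000
print(compression_ratio(reference, codec))  # 128.0
print(storage_bytes(codec, 3600))           # 450000.0 bytes for one hour
```

Under this assumption, one hour of speech at 1 kbps needs about 450 kB, against about 57.6 MB for the PCM reference, which is the kind of saving that matters for audio-book storage on low-end devices.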
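The abstract does not describe how the fast concatenation-cost calculation works. One generic way to make join costs fast and compact, shown here purely as an illustrative assumption and not as the thesis' scheme, is to vector-quantize the boundary frames of each unit and precompute the distances between codebook entries. Each join cost then becomes a single table lookup, and because the distance is symmetric, only the upper triangle of the table needs to be stored. All class and function names are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two boundary feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class JoinCostTable:
    """Hypothetical precomputed join costs between codebook entries.

    Only the upper triangle (diagonal included) is stored, so the table
    holds n * (n + 1) / 2 values instead of n * n.
    """

    def __init__(self, codebook: Sequence[Sequence[float]]) -> None:
        self.n = len(codebook)
        # Row-major packing of entries (i, j) with i <= j.
        self.packed: List[float] = [
            squared_distance(codebook[i], codebook[j])
            for i in range(self.n)
            for j in range(i, self.n)
        ]

    def _index(self, i: int, j: int) -> int:
        # Position of (i, j), i <= j, in the packed upper triangle.
        if i > j:
            i, j = j, i
        return i * self.n - i * (i - 1) // 2 + (j - i)

    def cost(self, left_end_code: int, right_start_code: int) -> float:
        """Join cost between the last frame of one unit and the first frame of the next."""
        return self.packed[self._index(left_end_code, right_start_code)]


table = JoinCostTable([(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)])
print(table.cost(1, 0))  # 25.0
print(table.cost(2, 1))  # 18.0
```

In a unit selection search, the boundary codes of each candidate unit would be computed once, offline, so the search never has to touch the raw feature vectors.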

Trepo - Institutional Repository of Tampere University

Voice Conversion

Author: Elina Helander
Hanna Silén
Jani Nurminen
Moncef Gabbouj
Victor Popa
Publication venue: IntechOpen
Publication date: 14/03/2012

IntechOpen

Efficient Gaussian Mixture Model Evaluation in Voice Conversion

Author: Jani Nurminen
Jilei Tian
Victor Popa
Publication date: 03/04/2020

Abstract: Voice conversion refers to adapting the characteristics of a source speaker's voice to those of a target speaker. Gaussian mixture models (GMMs) have been found to be efficient in the voice conversion task. The GMM parameters are estimated from a training set with the goal of minimizing the mean squared error (MSE) between the transformed and target vectors. The quality of the GMM therefore plays an important role in achieving good voice conversion quality. This paper presents a very efficient approach for evaluating GMMs directly from the model parameters, without using any test data, which facilitates improving the transformation performance, especially in embedded implementations. Although the proposed approach can be used in any application that utilizes GMM-based transformation, we take voice conversion as the example application throughout the paper. The proposed approach is tested in this context and evaluated against an MSE-based evaluation method. The results show that the proposed method is in line with all subjective observations and MSE results.
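For context, the sketch below shows the widely used joint-density GMM conversion function and the MSE-based evaluation that the paper compares against. It does not reproduce the paper's parameter-only evaluation method. To stay short, it assumes one-dimensional features, so each component's covariance blocks reduce to scalars: F(x) = sum_i p_i(x) * (mu_y,i + (sigma_yx,i / sigma_xx,i) * (x - mu_x,i)), where p_i(x) is the posterior probability of component i given the source value x. The `Component` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Component:
    """One component of a joint source/target GMM (scalar features assumed)."""
    weight: float   # mixture weight
    mu_x: float     # source mean
    mu_y: float     # target mean
    var_xx: float   # source variance
    cov_yx: float   # target-source covariance


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x: float, gmm: Sequence[Component]) -> List[float]:
    """p_i(x): posterior of each component given the source value x.

    For x far from every component the likelihoods can underflow to zero;
    a production version would work in the log domain.
    """
    likelihoods = [c.weight * gaussian_pdf(x, c.mu_x, c.var_xx) for c in gmm]
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


def convert(x: float, gmm: Sequence[Component]) -> float:
    """Posterior-weighted sum of per-component linear regressions."""
    post = posteriors(x, gmm)
    return sum(
        p * (c.mu_y + c.cov_yx / c.var_xx * (x - c.mu_x))
        for p, c in zip(post, gmm)
    )


def mse(source: Sequence[float], target: Sequence[float], gmm: Sequence[Component]) -> float:
    """Test-data evaluation: mean squared error between converted and target values."""
    errors = [(convert(x, gmm) - y) ** 2 for x, y in zip(source, target)]
    return sum(errors) / len(errors)


gmm = [Component(weight=1.0, mu_x=0.0, mu_y=1.0, var_xx=1.0, cov_yx=2.0)]
print(convert(0.5, gmm))                   # 2.0
print(mse([0.0, 1.0], [1.0, 3.0], gmm))   # 0.0
```

Note that `mse` needs paired source/target test data, which is exactly the dependency the paper's approach aims to remove.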

CiteSeerX

New Method for Delexicalization and its Application to Prosodic Tagging for Text-to-Speech Synthesis

Author: Alku Paavo
Järvikivi Juhani
Nurminen Jani
Raitio Tuomo
Suni Antti Santeri
Vainio Martti
Publication date: 01/01/2009

This paper describes a new flexible delexicalization method based on a glottal-excited parametric speech synthesis scheme. The system utilizes the inverse-filtered glottal flow and all-pole modeling of the vocal tract. The method makes it possible to retain and manipulate all relevant prosodic features of any kind of speech. Most importantly, these features include voice quality, which has not been properly modeled in earlier delexicalization methods. The functionality of the new method was tested in a prosodic tagging experiment aimed at providing word prominence data for a text-to-speech synthesis system. The experiment confirmed the usefulness of the method and further corroborated earlier evidence that linguistic factors influence the perception of prosodic prominence.
Peer reviewed
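The abstract mentions inverse filtering and all-pole modeling of the vocal tract. As a generic illustration only (the abstract does not detail the paper's glottal inverse filtering procedure, which is more elaborate), the sketch below fits an all-pole model with the autocorrelation method and the Levinson-Durbin recursion, then inverse-filters the signal with the resulting prediction-error filter A(z) = 1 + a1 z^-1 + ... + ap z^-p. Here the output is just the linear prediction residual; in a glottal-excited system the corresponding signal would relate to the glottal excitation. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of a finite frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p from autocorrelation r.

    Returns the coefficients [1, a1, ..., ap] and the final prediction error
    energy. Assumes r[0] > 0 and a signal that is not perfectly predictable.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Apply A(z) to x: e[n] = sum_k a[k] * x[n - k] (zero initial state)."""
    return [
        sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        for n in range(len(x))
    ]


# Toy all-pole signal: impulse response of 1 / (1 - 0.9 z^-1).
x = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(x, 1), 1)
residual = inverse_filter(x, a)
print(round(a[1], 6))  # -0.9
```

For real speech, the order would typically be chosen from the sampling rate and the analysis applied frame by frame with windowing; both are omitted here for brevity.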

Helsingin yliopiston digitaalinen arkisto

MPG.PuRe

Salvage Magnetic Resonance Imaging–guided Transurethral Ultrasound Ablation for Localized Radiorecurrent Prostate Cancer: 12-Month Functional and Oncological Results

Author: Antti Viitala
Jani Saunavaara
Mikael Anttinen
Pekka Taimen
Pertti Nurminen
Peter J. Boström
Pietari Mäkelä
Roberto Blanco Sequeiros
Teija Sainio
Visa Suomi
Publication venue: Elsevier BV
Publication date: 27/10/2022

Background: Up to half of all men who undergo primary radiotherapy for localized prostate cancer (PCa) experience local recurrence.
Objective: To evaluate the safety and early functional and oncological outcomes of salvage magnetic resonance imaging–guided transurethral ultrasound ablation (sTULSA) for men with localized radiorecurrent PCa.
Design, setting, and participants: This prospective, single-center phase 1 study (NCT03350529) enrolled men with biopsy-proven localized PCa recurrence after radiotherapy. Multiparametric magnetic resonance imaging (mpMRI) and 18F prostate-specific membrane antigen-1007 (18F-PSMA-1007) positron emission tomography (PET)-computed tomography (CT) were used to confirm organ-confined disease localization. Patients underwent either whole-gland or partial sTULSA, depending on their individual tumor characteristics.
Outcome measurements and statistical analysis: Patients were followed at 3-mo intervals. Adverse events (AEs, Clavien-Dindo scale), functional status questionnaires (Expanded Prostate Cancer Index Composite [EPIC]-26, International Prostate Symptom Score, International Index of Erectile Function-5), uroflowmetry, and prostate-specific antigen (PSA) were assessed at every visit. Disease control was assessed at 1 yr using mpMRI and 18F-PSMA-1007 PET-CT, followed by prostate biopsies.
Results and limitations: Eleven patients (median age 69 yr, interquartile range [IQR] 68–74) underwent sTULSA (3 whole-gland, 8 partial) and have completed 12-mo follow-up. Median PSA was 7.6 ng/ml (IQR 4.9–10), and the median time from initial PCa diagnosis to sTULSA was 11 yr (IQR 9.5–13). One grade 3 and three grade 2 AEs were reported, related to urinary retention and infection. Patients reported a modest degradation in functional status, most significantly a 20% decline in the EPIC-26 irritative/obstructive domain at 12 mo. A decline in maximum flow rate (24%) was also observed. At 1 yr, 10/11 patients were free of any PCa in the targeted ablation zone, with two out-of-field recurrences. Limitations include the nonrandomized design, limited sample size, and short-term oncological outcomes.
Conclusions: sTULSA appears to be safe and feasible for ablation of radiorecurrent PCa, offering encouraging preliminary oncological control.
Patient summary: We present safety and 1-yr functional and oncological outcomes of magnetic resonance imaging–guided transurethral ultrasound ablation (TULSA) as a salvage treatment for local prostate cancer recurrence after primary radiation. Salvage TULSA is safe and shows the ability to effectively ablate prostate cancer recurrence, with acceptable toxicity.

UTUPub

Palliative MRI-guided transurethral ultrasound ablation for symptomatic locally advanced prostate cancer

Author: Eemil Yli-Pietilä
Jani Saunavaara
Mikael Anttinen
Pekka Taimen
Pertti Nurminen
Peter J. Boström
Pietari Mäkelä
Roberto Blanco Sequeiros
Teija Sainio
Visa Suomi
Publication venue: Informa UK Limited
Publication date: 28/10/2022

Purpose: Locally advanced prostate cancer can cause bladder outlet obstruction, gross hematuria, and frequent hospitalization. While these complications are commonly treated by palliative transurethral resection of the prostate, the improvement is often insufficient. The purpose of this study was to evaluate the safety and feasibility of MRI-guided transurethral ultrasound ablation as an alternative palliative treatment option (pTULSA) for men with symptomatic locally advanced prostate cancer.
Methods: This prospective phase 1 study included 10 men in need of palliative surgical intervention due to urinary retention and gross hematuria caused by locally advanced prostate cancer. Patients were followed for 1 year at 3-month intervals. Time without catheter, time without hematuria, reduction in hospitalization time, and adverse events were measured.
Results: Ten patients with locally advanced prostate cancer were enrolled; all had continuous catheterization due to urinary retention, and nine had gross hematuria before treatment. At 1 week post-pTULSA, five patients were catheter-free. At last follow-up, the catheter-free and gross hematuria-free rates were 70% and 100%, respectively. Average hospitalization time due to local complications fell from 7.3 days in the 6 months before pTULSA to 1.4 days in the 6 months after. No treatment-related adverse events above Grade 2 were reported; all five reported treatment-related adverse events were urinary tract infections.
Conclusions: pTULSA appears safe and feasible for palliative ablation of locally advanced prostate cancer. The therapy seems to achieve long-term hematuria control, can relieve bladder outlet obstruction in selected patients, and seems to reduce the burden of hospitalization due to local complications.

UTUPub

Reducing the lead time of the outbound process: finished goods warehouse

Author: Nurminen Jani
Publication date: 01/01/2022

This thesis was carried out because Kiilto Oy wanted to develop the finished goods warehouse at its Lempäälä site. As order volumes grew, the efficiency of the warehouse's outbound process was no longer at the level required to meet demand. The aim of the thesis was to map the current state of the finished goods warehouse and, through development proposals, to reduce the lead time of the outbound process by improving picking, packing, and other operations. Information on the current state of warehouse operations was gathered by working in order picking, through observation and employee interviews, and from materials provided by the company. The current-state analysis brought out the problem areas in warehouse operations. A literature review was written as the basis for addressing these problem areas; it covered intralogistics, the outbound process, and inventory control and material flows. The study produced proposals to move the replenishment of picking locations from the pickers' job description to that of the warehouse organizers, to add accurate information to the picking labels, and to relocate the picking-label printers to better positions. Packing was improved by reducing the queuing time at the packing machine and by streamlining the work itself. Other operations were made more efficient through training and lean thinking. The work eliminated non-productive work from all parts of the outbound process. The current-state analysis and development proposals give the company a starting point for improving its operations. Implementing the proposals depends on the company's own willingness to develop.

Theseus

Conceptual model of environmental management system (EMS) of reversed information streams

Author: Eva Pongrácz
Jani Nurminen

In this paper, an approach to conceptualizing environmental management systems is studied using an object-oriented model of waste management theory, which is modeled using the PSSP language. The use of the PSSP methodology in creating a conceptual model of environmental management systems is illustrated by modeling the new theory of waste management and conceptualizing it as a functional environmental management system. Furthermore, a conceptual model of an environmental management system of reversed information streams is introduced.

CiteSeerX

A Novel Technique for Voice Conversion Based on Style and Content Decomposition with Bilinear Models

Author: Jani Nurminen
Moncef Gabbouj
Victor Popa
Publication date: 01/01/2009

This paper presents a novel technique for voice conversion that solves a two-factor task using bilinear models. The spectral content of speech, represented as line spectral frequencies, is separated into so-called style and content parameterizations using a framework proposed in [1]. Formulating the voice conversion problem in terms of style and content offers a flexible representation of factor interactions and facilitates the use of efficient training algorithms based on singular value decomposition and expectation maximization. Promising results in a comparison with the traditional Gaussian mixture model based method indicate increased robustness with small training sets.
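The abstract does not give the model equations. As a hedged illustration of how an asymmetric bilinear model can be used for conversion, the sketch assumes that each speaker (style) s has a matrix A_s and each phonetic unit (content) c a vector b_c, so that a spectral observation is y = A_s b_c. With known source and target style matrices, the content vector is estimated from a source observation by least squares and re-synthesized with the target style. The SVD/EM training described in the paper is not shown, the matrices are made-up toy values, and all names are hypothetical; this is not necessarily the paper's conversion procedure.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(a: Matrix, v: Vector) -> Vector:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def solve(m: Matrix, rhs: Vector) -> Vector:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(m)
    aug = [row[:] + [b] for row, b in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def estimate_content(style: Matrix, observation: Vector) -> Vector:
    """Least-squares content vector: solve (A^T A) b = A^T y.

    Assumes the style matrix has full column rank.
    """
    at = transpose(style)
    ata = [[sum(x * y for x, y in zip(ci, cj)) for cj in at] for ci in at]
    return solve(ata, mat_vec(at, observation))


def convert_style(source_style: Matrix, target_style: Matrix, observation: Vector) -> Vector:
    """Keep the estimated content, swap the speaker-specific style matrix."""
    return mat_vec(target_style, estimate_content(source_style, observation))


# Toy 3-dimensional observations with a 2-dimensional content space.
A_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A_tgt = [[2.0, 0.0], [0.0, 3.0], [1.0, -1.0]]
y_src = mat_vec(A_src, [0.5, -1.0])
print(convert_style(A_src, A_tgt, y_src))  # [1.0, -3.0, 1.5]
```

Separating the factors this way means a new target speaker only requires its own style matrix, while the content representation is shared, which is one intuition for the robustness with small training sets reported in the abstract.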

CiteSeerX

Hong Kong University of Science and Technology Institutional Repository